22 research outputs found

    NCBI GEO: mining millions of expression profiles—database and tools

    Get PDF
    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30 000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo

    NCBI Peptidome: a new repository for mass spectrometry proteomics data

    Get PDF
    Peptidome is a public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data generated by the scientific community. Data from all stages of a mass spectrometry experiment are captured, including original mass spectra files, experimental metadata and conclusion-level results. The submission process is facilitated through acceptance of data in commonly used open formats, and all submissions undergo syntactic validation and curation in an effort to uphold data integrity and quality. Peptidome is not restricted to specific organisms, instruments or experiment types; data from any tandem mass spectrometry experiment from any species are accepted. In addition to data storage, web-based interfaces are available to help users query, browse and explore individual peptides, proteins or entire Samples and Studies. Results are integrated and linked with other NCBI resources to ensure dissemination of the information beyond the mass spectroscopy proteomics community. Peptidome is freely accessible at http://www.ncbi.nlm.nih.gov/peptidome

    The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment

    Get PDF
    The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in operation since July 2014. This paper describes the second data release from this phase, and the fourteenth from SDSS overall (making this, Data Release Fourteen or DR14). This release makes public data taken by SDSS-IV in its first two years of operation (July 2014-2016). Like all previous SDSS releases, DR14 is cumulative, including the most recent reductions and calibrations of all data taken by SDSS since the first phase began operations in 2000. New in DR14 is the first public release of data from the extended Baryon Oscillation Spectroscopic Survey (eBOSS); the first data from the second phase of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2), including stellar parameter estimates from an innovative data driven machine learning algorithm known as "The Cannon"; and almost twice as many data cubes from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous release (N = 2812 in total). This paper describes the location and format of the publicly available data from SDSS-IV surveys. We provide references to the important technical papers describing how these data have been taken (both targeting and observation details) and processed for scientific use. The SDSS website (www.sdss.org) has been updated for this release, and provides links to data downloads, as well as tutorials and examples of data use. SDSS-IV is planning to continue to collect astronomical data until 2020, and will be followed by SDSS-V.Comment: SDSS-IV collaboration alphabetical author data release paper. DR14 happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov 2017 (this is the "post-print" and "post-proofs" version; minor corrections only from v1, and most of errors found in proofs corrected

    A taxonomy of epithelial human cancer and their metastases

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Microarray technology has allowed to molecularly characterize many different cancer sites. This technology has the potential to individualize therapy and to discover new drug targets. However, due to technological differences and issues in standardized sample collection no study has evaluated the molecular profile of epithelial human cancer in a large number of samples and tissues. Additionally, it has not yet been extensively investigated whether metastases resemble their tissue of origin or tissue of destination.</p> <p>Methods</p> <p>We studied the expression profiles of a series of 1566 primary and 178 metastases by unsupervised hierarchical clustering. The clustering profile was subsequently investigated and correlated with clinico-pathological data. Statistical enrichment of clinico-pathological annotations of groups of samples was investigated using Fisher exact test. Gene set enrichment analysis (GSEA) and DAVID functional enrichment analysis were used to investigate the molecular pathways. Kaplan-Meier survival analysis and log-rank tests were used to investigate prognostic significance of gene signatures.</p> <p>Results</p> <p>Large clusters corresponding to breast, gastrointestinal, ovarian and kidney primary tissues emerged from the data. Chromophobe renal cell carcinoma clustered together with follicular differentiated thyroid carcinoma, which supports recent morphological descriptions of thyroid follicular carcinoma-like tumors in the kidney and suggests that they represent a subtype of chromophobe carcinoma. We also found an expression signature identifying primary tumors of squamous cell histology in multiple tissues. Next, a subset of ovarian tumors enriched with endometrioid histology clustered together with endometrium tumors, confirming that they share their etiopathogenesis, which strongly differs from serous ovarian tumors. In addition, the clustering of colon and breast tumors correlated with clinico-pathological characteristics. Moreover, a signature was developed based on our unsupervised clustering of breast tumors and this was predictive for disease-specific survival in three independent studies. Next, the metastases from ovarian, breast, lung and vulva cluster with their tissue of origin while metastases from colon showed a bimodal distribution. A significant part clusters with tissue of origin while the remaining tumors cluster with the tissue of destination.</p> <p>Conclusion</p> <p>Our molecular taxonomy of epithelial human cancer indicates surprising correlations over tissues. This may have a significant impact on the classification of many cancer sites and may guide pathologists, both in research and daily practice. Moreover, these results based on unsupervised analysis yielded a signature predictive of clinical outcome in breast cancer. Additionally, we hypothesize that metastases from gastrointestinal origin either remember their tissue of origin or adapt to the tissue of destination. More specifically, colon metastases in the liver show strong evidence for such a bimodal tissue specific profile.</p

    The 13th Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the SDSS-IV Survey Mapping Nearby Galaxies at Apache Point Observatory

    Get PDF
    The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) began observations in July 2014. It pursues three core programs: APOGEE-2,MaNGA, and eBOSS. In addition, eBOSS contains two major subprograms: TDSS and SPIDERS. This paper describes the first data release from SDSS-IV, Data Release 13 (DR13), which contains new data, reanalysis of existing data sets and, like all SDSS data releases, is inclusive of previously released data. DR13 makes publicly available 1390 spatially resolved integral field unit observations of nearby galaxies from MaNGA,the first data released from this survey. It includes new observations from eBOSS, completing SEQUELS. In addition to targeting galaxies and quasars, SEQUELS also targeted variability-selected objects from TDSS and X-ray selected objects from SPIDERS. DR13 includes new reductions ofthe SDSS-III BOSS data, improving the spectrophotometric calibration and redshift classification. DR13 releases new reductions of the APOGEE-1data from SDSS-III, with abundances of elements not previously included and improved stellar parameters for dwarf stars and cooler stars. For the SDSS imaging data, DR13 provides new, more robust and precise photometric calibrations. Several value-added catalogs are being released in tandem with DR13, in particular target catalogs relevant for eBOSS, TDSS, and SPIDERS, and an updated red-clump catalog for APOGEE.This paper describes the location and format of the data now publicly available, as well as providing references to the important technical papers that describe the targeting, observing, and data reduction. The SDSS website, http://www.sdss.org, provides links to the data, tutorials and examples of data access, and extensive documentation of the reduction and analysis procedures. DR13 is the first of a scheduled set that will contain new data and analyses from the planned ~6-year operations of SDSS-IV.PostprintPeer reviewe

    The 16th Data Release of the Sloan Digital Sky Surveys: First Release from the APOGEE-2 Southern Survey and Full Release of eBOSS Spectra

    Get PDF
    This paper documents the 16th data release (DR16) from the Sloan Digital Sky Surveys (SDSS), the fourth and penultimate from the fourth phase (SDSS-IV). This is the first release of data from the Southern Hemisphere survey of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2); new data from APOGEE-2 North are also included. DR16 is also notable as the final data release for the main cosmological program of the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), and all raw and reduced spectra from that project are released here. DR16 also includes all the data from the Time Domain Spectroscopic Survey and new data from the SPectroscopic IDentification of ERosita Survey programs, both of which were co-observed on eBOSS plates. DR16 has no new data from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey (or the MaNGA Stellar Library "MaStar"). We also preview future SDSS-V operations (due to start in 2020), and summarize plans for the final SDSS-IV data release (DR17)

    The 16th Data Release of the Sloan Digital Sky Surveys: First Release from the APOGEE-2 Southern Survey and Full Release of eBOSS Spectra

    Get PDF
    This paper documents the 16th data release (DR16) from the Sloan Digital Sky Surveys (SDSS), the fourth and penultimate from the fourth phase (SDSS-IV). This is the first release of data from the Southern Hemisphere survey of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2); new data from APOGEE-2 North are also included. DR16 is also notable as the final data release for the main cosmological program of the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), and all raw and reduced spectra from that project are released here. DR16 also includes all the data from the Time Domain Spectroscopic Survey and new data from the SPectroscopic IDentification of ERosita Survey programs, both of which were co-observed on eBOSS plates. DR16 has no new data from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey (or the MaNGA Stellar Library "MaStar"). We also preview future SDSS-V operations (due to start in 2020), and summarize plans for the final SDSS-IV data release (DR17)

    The 16th Data Release of the Sloan Digital Sky Surveys : First Release from the APOGEE-2 Southern Survey and Full Release of eBOSS Spectra

    Get PDF
    This paper documents the 16th data release (DR16) from the Sloan Digital Sky Surveys (SDSS), the fourth and penultimate from the fourth phase (SDSS-IV). This is the first release of data from the Southern Hemisphere survey of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2); new data from APOGEE-2 North are also included. DR16 is also notable as the final data release for the main cosmological program of the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), and all raw and reduced spectra from that project are released here. DR16 also includes all the data from the Time Domain Spectroscopic Survey and new data from the SPectroscopic IDentification of ERosita Survey programs, both of which were co-observed on eBOSS plates. DR16 has no new data from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey (or the MaNGA Stellar Library "MaStar"). We also preview future SDSS-V operations (due to start in 2020), and summarize plans for the final SDSS-IV data release (DR17).Peer reviewe

    The Eleventh and Twelfth Data Releases of the Sloan Digital Sky Survey: Final Data from SDSS-III

    Get PDF
    The third generation of the Sloan Digital Sky Survey (SDSS-III) took data from 2008 to 2014 using the original SDSS wide-field imager, the original and an upgraded multi-object fiber-fed optical spectrograph, a new near-infrared high-resolution spectrograph, and a novel optical interferometer. All of the data from SDSS-III are now made public. In particular, this paper describes Data Release 11 (DR11) including all data acquired through 2013 July, and Data Release 12 (DR12) adding data acquired through 2014 July (including all data included in previous data releases), marking the end of SDSS-III observing. Relative to our previous public release (DR10), DR12 adds one million new spectra of galaxies and quasars from the Baryon Oscillation Spectroscopic Survey (BOSS) over an additional 3000 deg2 of sky, more than triples the number of H-band spectra of stars as part of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE), and includes repeated accurate radial velocity measurements of 5500 stars from the Multi-object APO Radial Velocity Exoplanet Large-area Survey (MARVELS). The APOGEE outputs now include the measured abundances of 15 different elements for each star. In total, SDSS-III added 5200 deg2 of ugriz imaging; 155,520 spectra of 138,099 stars as part of the Sloan Exploration of Galactic Understanding and Evolution 2 (SEGUE-2) survey; 2,497,484 BOSS spectra of 1,372,737 galaxies, 294,512 quasars, and 247,216 stars over 9376 deg2; 618,080 APOGEE spectra of 156,593 stars; and 197,040 MARVELS spectra of 5513 stars. Since its first light in 1998, SDSS has imaged over 1/3 of the Celestial sphere in five bands and obtained over five million astronomical spectra. \ua9 2015. The American Astronomical Society
    corecore